
1,432 research outputs found

Performance analysis of a parallel, multi-node pipeline for DNA sequencing

Author: A Hatem
A McKenna
D Decap
GA Van der Auwera
H Li
H Li
J Dean
MA Depristo
ST Sherry
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Post-sequencing DNA analysis typically consists of read mapping followed by variant calling and is very time-consuming, even on a multi-core machine. Recently, we proposed Halvade, a parallel, multi-node implementation of a DNA sequencing pipeline according to the GATK Best Practices recommendations. The MapReduce programming model is used to distribute the workload among different workers. In this paper, we study the impact of different hardware configurations on the performance of Halvade. Benchmarks indicate that the lack of good multithreading capabilities in the existing tools (BWA, SAMtools, Picard, GATK), in particular, causes suboptimal scaling behavior. We demonstrate that it is possible to circumvent this bottleneck by using multiprocessing on high-memory machines rather than multithreading. Using a 15-node cluster with 360 CPU cores in total, this results in a runtime of 1 h 31 min. Compared to a single-threaded runtime of approximately 12 days, this corresponds to an overall parallel efficiency of 53%.
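The efficiency figure quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes the usual definition of parallel efficiency (speedup divided by the number of cores) and takes "12 days" as exactly 288 hours; the function name is illustrative and does not come from the paper.

```python
def parallel_efficiency(serial_hours: float, parallel_hours: float, cores: int) -> float:
    """Return speedup per core, i.e. (serial time / parallel time) / cores."""
    speedup = serial_hours / parallel_hours
    return speedup / cores


# Figures taken from the abstract; "12 days" is approximate in the source.
serial_hours = 12 * 24          # single-threaded runtime, in hours
parallel_hours = 1 + 31 / 60    # 1 h 31 min on the cluster
cores = 360                     # 15 nodes, 360 CPU cores in total

efficiency = parallel_efficiency(serial_hours, parallel_hours, cores)
print(f"speedup: {serial_hours / parallel_hours:.0f}x, efficiency: {efficiency:.0%}")
```

Under these assumptions the result rounds to the 53% reported in the abstract.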

Crossref

Ghent University Academic Bibliography

Illuminating Choices for Library Prep: A Comparison of Library Preparation Methods for Whole Genome Sequencing of Cryptococcus neoformans Using Illumina HiSeq.

Author: A Adey
A McKenna
BJ Loftus
EL van Dijk
GA Van der Auwera
H Li
H Li
H Li
J Dabney
JD McPherson
Johanna Rhodes
Kirsten Nielsen
L DeFrancesco
M Eisenstein
MA DePristo
MA Quail
Mathew A. Beale
Matthew C. Fisher
R Marine
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 24/10/2014

The industry of next-generation sequencing is constantly evolving, with novel library preparation methods and new sequencing machines being released by the major sequencing technology companies annually. The Illumina TruSeq v2 library preparation method was the most widely used kit and the market leader; however, it has now been discontinued, and in 2013 was replaced by the TruSeq Nano and TruSeq PCR-free methods, leaving a gap in knowledge regarding which is the most appropriate library preparation method to use. Here, we used isolates from the pathogenic fungus Cryptococcus neoformans var. grubii and sequenced them using the existing TruSeq DNA v2 kit (Illumina), along with two new kits: the TruSeq Nano DNA kit (Illumina) and the NEBNext Ultra DNA kit (New England Biolabs) to provide a comparison. Compared to the original TruSeq DNA v2 kit, both newer kits gave equivalent or better sequencing data, with increased coverage. When comparing the two newer kits, we found little difference in cost and workflow, with the NEBNext Ultra both slightly cheaper and faster than the TruSeq Nano. However, the quality of data generated using the TruSeq Nano DNA kit was superior due to higher coverage at regions of low GC content, and more SNPs identified. Researchers should therefore evaluate their resources and the type of application (and hence data quality) being considered when ultimately deciding on which library prep method to use.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Spiral - Imperial College Digital Repository

St George's Online Research Archive

Epistasis not needed to explain low dN/dS

Author: AL Halpern
AS Kondrashov
AU Tamuri
David M. McCandlish
DM Fowler
Etienne Rajon
J da Silva
Joshua B. Plotkin
MA DePristo
MLM Salverda
MS Breen
N Rodrigue
Premal Shah
S Kryazhimskiy
SC Choi
TF Hansen
WH Li
Yang Ding
Z Yang
Z Yang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/12/2012

An important question in molecular evolution is whether an amino acid that occurs at a given position makes an independent contribution to fitness, or whether its effect depends on the state of other loci in the organism's genome, a phenomenon known as epistasis. In a recent letter to Nature, Breen et al. (2012) argued that epistasis must be "pervasive throughout protein evolution" because the observed ratio between the per-site rates of non-synonymous and synonymous substitutions (dN/dS) is much lower than would be expected in the absence of epistasis. However, when calculating the expected dN/dS ratio in the absence of epistasis, Breen et al. assumed that all amino acids observed in a protein alignment at any particular position have equal fitness. Here, we relax this unrealistic assumption and show that any dN/dS value can in principle be achieved at a site, without epistasis. Furthermore, for all nuclear and chloroplast genes in the Breen et al. dataset, we show that the observed dN/dS values and the observed patterns of amino acid diversity at each site are jointly consistent with a non-epistatic model of protein evolution. Comment: This manuscript is in response to "Epistasis as the primary factor in molecular evolution" by Breen et al. Nature 490, 535-538 (2012).

arXiv.org e-Print Archive

Crossref

Cold Spring Harbor Laboratory Institutional Repository

INRIA a CCSD electronic archive server

HAL Descartes

Probing Evolutionary Repeatability: Neutral and Double Changes and the Predictability of Evolutionary Adaptation

Author: Scott William Roy
Debbie Fox
DM Weinreich
WP Stemmer
MA DePristo
JH Gillespie
HA Orr
CO Wilke
E Van Nimwegen
M Kimura
DM Weinreich
Publication venue: Public Library of Science
Publication date: 01/01/1996

The question of how organisms adapt is among the most fundamental in evolutionary biology. Two recent studies investigated the evolution of Escherichia coli in response to challenge with the antibiotic cefotaxime. Studying five mutations in the beta-lactamase gene that together confer significant antibiotic resistance, the authors showed a complex fitness landscape that greatly constrained the identity and order of intermediates leading from the initial wildtype genotype to the final resistant genotype. Out of 18 billion possible orders of single mutations leading from non-resistant to fully-resistant form, they found that only 27 (1.5 × 10^(-7)%) pathways were characterized by consistently increasing resistance, thus only a tiny fraction of possible paths are accessible by positive selection. I further explore these data in several ways. Allowing neutral changes (those that do not affect resistance) increases the number of accessible pathways considerably, from 27 to 629. Allowing multiple simultaneous mutations also greatly increases the number of accessible pathways. Allowing a single case of double mutation to occur along a pathway increases the number of pathways from 27 to 259, and allowing arbitrarily many pairs of simultaneous changes increases the number of possible pathways by more than 100 fold, to 4800. I introduce the metric 'repeatability,' the probability that two random trials will proceed via the exact same pathway. In general, I find that while the total number of accessible pathways is dramatically affected by allowing neutral or double mutations, the overall evolutionary repeatability is generally much less affected. These results probe the conceivable pathways available to evolution. Even when many of the assumptions of the analysis of Weinreich et al. (2006) are relaxed, I find that evolution to more highly cefotaxime resistant beta-lactamase proteins is still highly repeatable.
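The 'repeatability' metric defined in this abstract can be illustrated with a minimal sketch. It assumes that two trials are independent and that each pathway i is followed with a known probability p_i, in which case the chance that both trials take the same pathway is the sum of p_i squared. The pathway probabilities below are hypothetical and are not taken from the paper.

```python
def repeatability(pathway_probs: list[float]) -> float:
    """Probability that two independent trials follow the same pathway."""
    total = sum(pathway_probs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("pathway probabilities must sum to 1")
    return sum(p * p for p in pathway_probs)


# Hypothetical example: three accessible pathways with unequal probabilities.
few_pathways = [0.5, 0.3, 0.2]

# Same landscape with 100 extra, individually rare pathways sharing 5% of
# the probability mass (again hypothetical numbers).
rare = [0.05 / 100] * 100
many_pathways = [0.95 * p for p in few_pathways] + rare

print(repeatability(few_pathways))
print(repeatability(many_pathways))
```

In this toy case the number of pathways grows from 3 to 103 while the repeatability changes only slightly, which is the kind of behaviour the abstract describes: rare pathways add to the count but contribute little to the sum of squared probabilities.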

Public Library of Science (PLOS)

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Directory of Open Access Journals

PubMed Central

University of Groningen Digital Archive

Dissertations of the University of Groningen

SNP discovery in apple cultivars using next generation sequencing

Author: F Denardi
F Denardi
Georgios Pappas
H Li
H Li
Luís Fernando Revers
MA Depristo
Marcos Costa
Orzenil Silva-Junior
R Velasco
Roberto Togawa
Sérgio Alencar
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Quantifying single nucleotide variant detection sensitivity in exome sequencing

Author: A McKenna
AJ Coffey
Alison M Meynert
AM Sulonen
Andrew P Jackson
B Lehne
B Timmermann
DN Cooper
E Kalay
H Li
H Li
J Parla
JF Degner
JK Teer
K Fransen
KK Mantripragada
Louise S Bicknell
M Choi
MA Depristo
Martin S Taylor
Matthew E Hurles
MD Mailman
MJ Clark
MN Bainbridge
MW Hahn
R Leinonen
RA Harte
RE Thurman
SB Ng
SB Ng
SB Ng
SS Ajay
The International HapMap 3 Consortium
Y Li
Y Xu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

BACKGROUND: The targeted capture and sequencing of genomic regions has rapidly demonstrated its utility in genetic studies. Inherent in this technology is considerable heterogeneity of target coverage and this is expected to systematically impact our sensitivity to detect genuine polymorphisms. To fully interpret the polymorphisms identified in a genetic study it is often essential both to detect polymorphisms and to understand where and with what probability real polymorphisms may have been missed. RESULTS: Using down-sampling of 30 deeply sequenced exomes and a set of gold-standard single nucleotide variant (SNV) genotype calls for each sample, we developed an empirical model relating the read depth at a polymorphic site to the probability of calling the correct genotype at that site. We find that measured sensitivity in SNV detection is substantially worse than that predicted from the naive expectation of sampling from a binomial. This calibrated model allows us to produce single nucleotide resolution SNV sensitivity estimates which can be merged to give summary sensitivity measures for any arbitrary partition of the target sequences (nucleotide, exon, gene, pathway, exome). These metrics are directly comparable between platforms and can be combined between samples to give “power estimates” for an entire study. We estimate a local read depth of 13X is required to detect the alleles and genotype of a heterozygous SNV 95% of the time, but only 3X for a homozygous SNV. At a mean on-target read depth of 20X, commonly used for rare disease exome sequencing studies, we predict 5–15% of heterozygous and 1–4% of homozygous SNVs in the targeted regions will be missed. CONCLUSIONS: Non-reference alleles in the heterozygote state have a high chance of being missed when commonly applied read coverage thresholds are used despite the widely held assumption that there is good polymorphism detection at these coverage levels. Such alleles are likely to be of functional importance in population based studies of rare diseases, somatic mutations in cancer and explaining the “missing heritability” of quantitative traits.
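The "naive expectation of sampling from a binomial" mentioned in this abstract can be sketched as follows. This is an illustrative model only, not the calibrated empirical model from the paper: it assumes each read at a heterozygous site carries the non-reference allele independently with probability 0.5, and that a caller needs at least `min_alt_reads` supporting reads. Both the function name and the threshold of 2 reads are assumptions made for this example.

```python
from math import comb


def naive_het_sensitivity(depth: int, min_alt_reads: int = 2,
                          alt_fraction: float = 0.5) -> float:
    """Binomial probability of seeing at least `min_alt_reads` alternate reads."""
    # Sum P(X = k) for every k that is too small to support a call.
    p_too_few = sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min(min_alt_reads, depth + 1))
    )
    return 1 - p_too_few


for depth in (3, 8, 13, 20):
    print(depth, round(naive_het_sensitivity(depth), 4))
```

Under this simple model, sensitivity at a depth of 13X already exceeds 99%, whereas the abstract reports that 13X is needed to reach 95% in practice, consistent with its statement that measured sensitivity is substantially worse than the binomial prediction.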

Crossref

Springer - Publisher Connector

PubMed Central

Edinburgh Research Explorer

Gene expression drives the evolution of dominance.

Author: A Durvasula
A Platt
AF Agrawal
B Charlesworth
BM Henn
BY Kim
CD Huber
CD Huber
D Enard
D Ortega-Del Vecchyo
D Szklarczyk
DJ Balick
F Gao
F Manna
FH Shaw
H Kacser
HA Orr
I Frumkin
J Yang
JBS Haldane
JS Sanjak
KE Lohmueller
KM Teshima
LD Hurst
MA DePristo
MJ Simmons
N Phadnis
P Cingolani
P Lamesch
PY Novikova
RA Fisher
RD Hernandez
RN Gutenkunst
S Glémin
S Ossowski
S Williamson
S Wright
SH Williamson
T Bedford
T Kawakatsu
T Mukai
TI Gossmann
TT Hu
X Zheng
YB Simons
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

Dominance is a fundamental concept in molecular genetics and has implications for understanding patterns of genetic variation, evolution, and complex traits. However, despite its importance, the degree of dominance in natural populations is poorly quantified. Here, we leverage multiple mating systems in natural populations of Arabidopsis to co-estimate the distribution of fitness effects and dominance coefficients of new amino acid changing mutations. We find that more deleterious mutations are more likely to be recessive than less deleterious mutations. Further, this pattern holds across gene categories, but varies with the connectivity and expression patterns of genes. Our work argues that dominance arises as a consequence of the functional importance of genes and their optimal expression levels.

Crossref

eScholarship - University of California

Purifying Selection in Deeply Conserved Human Enhancers Is More Consistent than in Coding Sequences

Author: A Eyre-Walker
A Kasprzyk
A Siepel
A Todorova
A Woolfe
A Woolfe
AB Singleton
AL Hughes
AR Boyko
Arnar Palsson
AS Ethayathulla
D Boffelli
DA Tagle
DG Torgerson
Dilrini R. De Silva
DJ Epstein
DL Halligan
E Berezikov
F Butter
G Bejerano
G Elgar
G Piganeau
G Piganeau
GD Stormo
GG Loots
GK McEwen
GR Abecasis
GR Abecasis
GR Ritchie
Greg Elgar
H Li
HJ Parker
I Dubchak
I Keller
IH Consortium
JA Drake
JJ Cai
JM Bras
K Tamura
LA Lettice
M Claussnitzer
M Kasowski
M Spivakov
MA Antezana
MA DePristo
MB Hammer
P Flicek
R McDaniell
R Sachidanandam
RD Dowell
RD Hernandez
Richard Nichols
RJ Guerreiro
S Asthana
S Benko
S Katzman
S Minovitsky
SB Hedges
W McLaren
W Stephan
XJ Mu
YY Teo
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

(c) 2014 De Silva et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Queen Mary Research Online

FigShare

Evolution favors protein mutational robustness in sufficiently large populations

Author: A Meyerhans
A Moya
A Wagner
AA Pakula
Alpan Raval
AM Lesk
BW Matthews
CO Wilke
CO Wilke
CO Wilke
CR Otey
D Posada
DA Drummond
David Chen
DC Krakauer
DJ Lipman
DL Hartl
DM Taverna
DM Taverna
E Bornberg-Bauer
E van Nimwegen
E Zuckerkandl
F Sun
Frances H Arnold
G Piganeau
HH Guo
HJ Barnes
JA Wells
JB Plotkin
JD Bloom
JD Bloom
JD Bloom
JD Bloom
JD Bloom
Jesse D Bloom
JM Smith
JM Smith
JN Franklin
KA Bava
L Serrano
M Kimura
M Lynch
M Vignuzzi
MA DePristo
MA Huynen
N Takahata
OG Berg
Ophelia S Venturelli
PC Cirino
PD Sniegowski
R Godoy-Ruiz
R Montville
RE Lenski
S Bershtein
S Brin
S Shafikhani
T Kanagawa
W Besenmatter
X Wang
XJ Zhang
Y Xia
Zhongyi Lu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/04/2007

BACKGROUND: An important question is whether evolution favors properties such as mutational robustness or evolvability that do not directly benefit any individual, but can influence the course of future evolution. Functionally similar proteins can differ substantially in their robustness to mutations and capacity to evolve new functions, but it has remained unclear whether any of these differences might be due to evolutionary selection for these properties. RESULTS: Here we use laboratory experiments to demonstrate that evolution favors protein mutational robustness if the evolving population is sufficiently large. We neutrally evolve cytochrome P450 proteins under identical selection pressures and mutation rates in populations of different sizes, and show that proteins from the larger and thus more polymorphic population tend towards higher mutational robustness. Proteins from the larger population also evolve greater stability, a biophysical property that is known to enhance both mutational robustness and evolvability. The excess mutational robustness and stability are well described by existing mathematical theories, and can be quantitatively related to the way that the proteins occupy their neutral network. CONCLUSIONS: Our work is the first experimental demonstration of the general tendency of evolution to favor mutational robustness and protein stability in highly polymorphic populations. We suggest that this phenomenon may contribute to the mutational robustness and evolvability of viruses and bacteria that exist in large populations.

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

A simple data-adaptive probabilistic variant calling model

Author: A McKenna
B Efron
H Li
H Li
H Li
H Xu
J O’Rawe
KE McElroy
Korbinian Strimmer
MA DePristo
Peter F Stadler
S Hoffmann
S Pabinger
Steve Hoffmann
X Liu
X Yu
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Crossref